Compensating for vowel coarticulation in continuous speech recognition

Author

  • James Hieronymus

Abstract

Coarticulation alters the vowel formant characteristics in continuous speech. Studies of isolated monosyllables in the literature suggest that some phonemes cause more severe distortions than others. The largest changes are caused by /r/, /l/, and /w/. Unstressed vowels are most affected. Previous studies by Holmes [6] and by us [13] indicate that these effects are even larger for continuous speech. Vowel recognition algorithms which do not take context into account in continuous speech normally achieve correct recognition of approximately 75% for the three top choices from the recognizer. By developing methods which explicitly model the phonetic context, higher levels of performance are expected to be achieved. An ongoing study is being made of all 16 of the American English vowels using a subset of the DARPA acoustic-phonetic data base, a phonetically labeled 6300-sentence data base with 630 talkers. A subset of 7 vowels /iy/, /I/, /eh/, /ae/, /o/ and /u/ have been studied in all major contexts. The formant temporal patterns are being examined for phoneme triples and quintuples with a vowel in the center. These formant patterns are discussed along with some effects of stress and speaking rate.

Introduction

Early studies of vowels by Potter and Steinberg [1] and Peterson and Barney [2] used isolated monosyllabic words to provide canonical formant frequencies for each monophthong vowel in American English. Given a neutral h_d context, the vowels clustered well in the space of the first and second formants, across hundreds of talkers. Vowels were believed to be characterized by the steady-state frequencies of the first two or three formants. This view was challenged by perceptual studies by Fairbanks and Grubb [3] for American English and Fujimura and Ochiai [4] for Japanese. They showed that the steady-state portions of the vowels excised from the words were less identifiable than the whole words. In some way vowel context is very important in disambiguating vowels.
Strange et al [5] showed that by listening to just the transitions into and out of the vowels, with silence replacing the usual vowel, listeners were able to identify the vowel almost as well as when played the whole word. An important cue to identification in this experiment was duration. With duration not available as a cue, the vowel-less vowel identification accuracy fell to somewhere between that for isolated vowels and whole words. An early study by Shearme and Holmes [6] showed that vowels in continuous speech very seldom had steady states and often did not overlap the Peterson-Barney contours in any part of their frequency trajectories in time. Generally the vowels are much more centralized in continuous speech, and the vowel formant regions overlap considerably due to coarticulation. The durations of vowels are longer for isolated words than for continuous speech. Vowels in front of pauses are also lengthened. One possible hypothesis from this data is that vowels in continuous speech are more identifiable than vowels in words bounded by silence. Presumably this is because a great deal of information is carried in the formant transitions. Strange et al [5] have advanced the theory that vowels are completely determined by the formant transitions into and out of the vowel. Nearey and Assmann [7] have postulated three alternate possible explanations for vowel perception: dual formant target, target plus slope, and target plus direction. In their experiments they were unable to determine which mechanism was a clear best choice. Experiments of Gay [8] would seem to favor target plus slope; those of Pols [9] favor target plus direction for Dutch diphthongs. All of these studies were conducted on isolated words. The approach presented here is to attempt speaker-independent vowel recognition based on formant frequencies and pitch. Speaker normalization using formant differences and F1 pitch as suggested by Syrdal and Gopal [10] has been studied for this data.
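The Syrdal-and-Gopal style of speaker normalization cited above is commonly described as working with differences between formants (and between F1 and pitch) on an auditory Bark scale, so that talker-dependent offsets largely cancel. A minimal sketch of that idea, assuming the Zwicker-Terhardt Bark approximation; the specific feature pairing (F1 with F0, F3 with F2) and the example formant values are illustrative assumptions, not figures from this paper:

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_difference_features(f0: float, f1: float, f2: float, f3: float) -> dict:
    """Speaker-normalized vowel features as Bark-scale differences.

    Z1 - Z0 (F1 relative to pitch) correlates with vowel height;
    Z3 - Z2 (F3 relative to F2) correlates with front/backness.
    Taking differences removes a per-talker additive offset on the Bark axis.
    """
    z0, z1, z2, z3 = (hz_to_bark(f) for f in (f0, f1, f2, f3))
    return {"Z1-Z0": z1 - z0, "Z3-Z2": z3 - z2}


# Illustrative values in the neighborhood of an adult-male /iy/:
# F0 ~ 130 Hz, F1 ~ 270 Hz, F2 ~ 2290 Hz, F3 ~ 3010 Hz.
# A high front vowel should give small values for both differences.
print(bark_difference_features(130.0, 270.0, 2290.0, 3010.0))
```

The design point is that the classifier then operates on these differences rather than raw formant frequencies, so two talkers whose vocal tracts shift all formants by roughly the same Bark offset map to nearby feature vectors.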
The second source of variation

* Institute for Computer Sciences and Technology, Building 225 Room A216, U.S. National Bureau of Standards, Gaithersburg, Md. 20899, U.S.A.

Similar articles

Acoustic modeling for spontaneous speech recognition using syllable dependent models

This paper proposes a syllable context dependent model for spontaneous speech recognition. It is generally assumed that, since spontaneous speech is greatly affected by coarticulation, an acoustic model featuring a longer range phonemic context is required to achieve a high degree of recognition accuracy. This motivated the authors to investigate a tri-syllable model that takes differences in t...
Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
The Effect of Stress and Speech Rate on Vowel Coarticulation in Catalan Vowel-Consonant-Vowel Sequences.

PURPOSE The goal of this study was to ascertain the effect of changes in stress and speech rate on vowel coarticulation in vowel-consonant-vowel sequences. METHOD Data on second formant coarticulatory effects as a function of changing /i/ versus /a/ were collected for five Catalan speakers' productions of vowel-consonant-vowel sequences with the fixed vowels /i/ and /a/ and consonants: the ap...
Modeling between-word coarticulation in continuous speech recognition

This paper describes the addition of between-word coarticulation modeling into SPHINX, an accurate large-vocabulary speaker-independent continuous speech recognition system. Between-word coarticulation is a major source of phonetic variability in continuous speech. By detailed modeling of between-word triphones and utilizing the generalized triphone technique, we obtain an error rate reduction o...
The perceptual effects of child-adult differences in fricative-vowel coarticulation.

Earlier work [Nittrouer et al., J. Speech Hear. Res. 32, 120-132 (1989)] demonstrated greater evidence of coarticulation in the fricative-vowel syllables of children than in those of adults when measured by anticipatory vowel effects on the resonant frequency of the fricative back cavity. In the present study, three experiments showed that this increased coarticulation led to improved vowel rec...


Publication date: 1987